Abstract

Let $P_1, \dots, P_k$ be $k \ge 3$ given normal populations with unknown means
$\theta_1, \dots, \theta_k$ and a common known variance $\sigma^2$. Let $X_1, \dots, X_k$ be
the sample means of $k$ independent samples of sizes $n_1, \dots, n_k$ from these populations.
To find the population with the largest mean, one usually applies the natural rule $d^N$,
which selects in terms of the largest sample mean.

In this paper, the performance of this rule is studied under 0-1 loss. It is shown
that $d^N$ is minimax if and only if $n_1 = \dots = n_k$. The rule $d^N$ is seen to perform
poorly whenever the parameters $\theta_1, \dots, \theta_k$ are close together. Several
alternative selection rules are derived in a Bayesian approach. They seem to be reasonable
competitors to $d^N$, worth comparing with $d^N$ in a future simulation study.
1. INTRODUCTION

Let $P_1, \dots, P_k$ be $k \ge 3$ given normal populations with unknown means
$\theta_1, \dots, \theta_k \in \mathbb{R}$ and a common known variance $\sigma^2 > 0$. Suppose
we want to find the population with the largest mean, where independent samples of sizes
$n_1, \dots, n_k$ from $P_1, \dots, P_k$ are available with sample means $X_1, \dots, X_k$,
respectively.

The natural decision rule $d^N$, which selects the population associated with the largest
sample mean, has been studied by many authors since it was introduced in the pioneering
paper of Bechhofer (1954). It was found to be the uniformly best permutation invariant
procedure if the sample sizes $n_1, \dots, n_k$ are all equal. The most general version of
this so-called “Bahadur-Eaton-Goodman-Lehmann Theorem” is presented in Gupta and Miescke
(1984), where the risk function of multi-stage selection rules with screening is studied
under a permutation invariant loss structure.

The situation changes drastically when the assumption of equal sample sizes is
dropped. Apart from consistency as the sample sizes tend to infinity, no optimum property
of the natural rule $d^N$ is known so far. On the contrary, Lam and Chiu (1976), and more
generally Tong and Wetzell (1979), have brought to light quite pathological behavior of
the probability of a correct selection under $d^N$, $P(CS \mid d^N)$, say. If
$\theta_1, \dots, \theta_k$ are sufficiently close together and if
$\theta_1, \dots, \theta_{k-1} < \theta_k$, then its value is strictly decreasing in $n_k$.

It should be noted that technically there will be no great changes if we assume that
$P_1, \dots, P_k$ have different but known variances. However, we feel that the chosen model
provides a better motivation for our considerations. Nevertheless, our analysis will be
based on $k$ independent random variables $X_i \sim N(\theta_i, p_i)$, $i = 1, \dots, k$,
where $p_1, \dots, p_k$ are known (in the chosen model, $p_i = \sigma^2 / n_i$), and can thus
be applied to the more general case, too.

Whenever comparisons with a control are incorporated into the problem, difficulties
caused by heteroscedasticity can be overcome more easily. This has been done, for example,
by Miescke (1981) and Gupta and Miescke (1985). However, the transition to the
corresponding problem without a control, as it is described in Miescke (1979), cannot be
made in the given situation.


Although some work has been done already to solve the given selection problem, no
modification or substitute of $d^N$ has been found so far which can be considered to be
better in some reasonable sense. Some insight into the structure of the problem has been
gained by Bechhofer and Tamhane (1986), who looked for the best allocations of observations,
subject to $n_1 + \dots + n_k$ being fixed, to maximize $P(CS \mid d^N)$ for the case of
known but unequal variances.

The problem under concern, although rarely mentioned in the literature (see, e.g.,
Berger (1983) and Miescke (1984)), is well known to the statistical community. A recent
simulation study by Zaher and Heiny (1984), where $d^N$ is compared with two similar rules
which are based on medians and rank-sums, respectively, under $n_1 = \dots = n_k$ but
different variances of $P_1, \dots, P_k$, corroborates this fact. It should be pointed out
that the problem of selecting a subset for unequal sample sizes (or unequal variances) has
been studied by Gupta and Huang (1976).

In the next section, the minimax approach is used to detect weak points in the
performance of $d^N$. However, no alternative decision rule can be found in this approach.
Therefore, Bayes rules with respect to various priors are studied in the subsequent
sections to find reasonable modifications of or alternatives to $d^N$. Similar techniques
have been used previously by Ehrman, Krieger and Miescke (1986) in the related subset
selection problem. Several promising candidates to be used as alternatives to $d^N$ will be
derived and proposed in this paper. A comparison of the performance characteristics of all
rules considered is planned for a future simulation study.

2.  MINIMAXITY 

The problem, which will be considered throughout this paper, can be formulated in
a concise form as follows. Given are independent random variables
$X_i \sim N(\theta_i, p_i)$, $i = 1, \dots, k$, where $p_1, \dots, p_k$ are fixed known
positive numbers. To be found is the index $i_0$, say, with
$\theta_{i_0} = \max\{\theta_1, \dots, \theta_k\}$, which we may assume to be unique for the
sake of simplicity. Under the 0-1 loss function, the probability of a correct selection and
the risk function of a (possibly randomized) decision rule $d$ at a parameter configuration
$\theta = (\theta_1, \dots, \theta_k) \in \mathbb{R}^k$ are connected through

$$P_\theta(CS \mid d) = 1 - R(\theta, d). \tag{1}$$

Thus  all  decision  theoretic  formulations  in  terms  of  risk  can  be  translated  immediately 
into  the  “P(CS)-language”  used  in  the  area  of  ranking  and  selection. 

We begin our study with minimax considerations since this will lead us directly to
the weak points in the performance of the natural decision rule $d^N$, which selects in
terms of the largest value among $X_1, \dots, X_k$. We shall see that the performance of
$d^N$ becomes unsatisfactory whenever the parameters $\theta_1, \dots, \theta_k$ are lying
closely together. This complements the findings of Lam and Chiu (1976) and of Tong and
Wetzell (1979) and indicates that $d^N$ cannot be considered to be a universally acceptable
decision rule.

Let $\varphi$ and $\Phi$ denote the density and cumulative distribution function,
respectively, of $N(0, 1)$ in the sequel. The first result is a reformulation of the
findings by Tong and Wetzell (1979), presented however in a form which is more suitable for
our further considerations, and proved differently.

LEMMA 1. The function

$$H(\gamma_1, \dots, \gamma_{k-1}) = \int_{\mathbb{R}} \prod_{i=1}^{k-1} \Phi(\gamma_i z)\,\varphi(z)\,dz \tag{2}$$

is strictly increasing in $\gamma_i > 0$, $i = 1, \dots, k-1$.


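Proof: The partial derivative of $H$ with respect to $\gamma_1$ is equal to

$$\int_{\mathbb{R}} \prod_{i=2}^{k-1} \Phi(\gamma_i z)\, z\,\varphi(\gamma_1 z)\,\varphi(z)\,dz. \tag{3}$$

After combining the two $\varphi$-functions, and then integrating by parts, it can be seen
that (3) equals

$$(2\pi)^{-1/2} (1 + \gamma_1^2)^{-1} \int_{\mathbb{R}} M(w)\,\varphi(w)\,dw, \tag{4}$$

where

$$M(w) = \frac{d}{dw} \prod_{i=2}^{k-1} \Phi\big(\gamma_i (1 + \gamma_1^2)^{-1/2} w\big), \quad w \in \mathbb{R}, \tag{5}$$

is clearly positive over the whole real line.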
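
As a quick numerical sanity check of Lemma 1 (not part of the original argument), the following Python sketch approximates $H$ by composite Simpson quadrature on $[-10, 10]$. The function names, the integration range and the step count are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo=-10.0, hi=10.0, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(lo + m * h)
    return total * h / 3.0

def H(gammas):
    """Approximate H(gamma_1, ..., gamma_{k-1}) from (2)."""
    return simpson(lambda z: math.prod(Phi(g * z) for g in gammas) * phi(z))

# k = 3: raising gamma_1 from 0.5 to 1 to 2 (gamma_2 fixed at 1) should raise H.
values = [H([g, 1.0]) for g in (0.5, 1.0, 2.0)]
```

At $\gamma_1 = \dots = \gamma_{k-1} = 1$ the integral equals $1/k$, since each of $k$ i.i.d. standard normals is equally likely to be the largest. This gives a convenient reference value for the middle entry of `values`.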

As  an  immediate  consequence,  we  can  state  the  following. 

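COROLLARY 1. The function

$$G(\sigma_1, \dots, \sigma_k) = \int_{\mathbb{R}} \prod_{i=1}^{k-1} \Phi(\sigma_i^{-1} z)\,\sigma_k^{-1} \varphi(\sigma_k^{-1} z)\,dz \tag{6}$$

is strictly decreasing in $\sigma_1, \dots, \sigma_{k-1}$ and strictly increasing in $\sigma_k$.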

Now we can state the main result of this section. The points of weakness of $d^N$, which
we have mentioned before, will become visible in the course of the proof.

THEOREM 1. For the given problem, the natural decision rule $d^N$ is minimax if and
only if $p_1 = p_2 = \dots = p_k$. Moreover, the minimax value of the problem is $1 - 1/k$.

Proof: Consider the no-data rule $d^0$, which selects every $i \in \{1, \dots, k\}$ with the
same probability $1/k$. It clearly has the constant risk $1 - 1/k$.

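The risk function of $d^N$ can be represented in a convenient way by using the following
notation. For any vector $a \in \mathbb{R}^k$, let $a_{[1]} \le \dots \le a_{[k]}$ denote the
ordered coordinates. Moreover, whenever $\theta = (\theta_1, \dots, \theta_k)$ and
$X = (X_1, \dots, X_k)$ are considered jointly in the sequel, let subscript $(j) = i$ if
$\theta_i = \theta_{[j]}$, $i, j = 1, \dots, k$. As mentioned before, we may assume that no
ties occur among the $\theta_i$'s. This simplifies our considerations without losing
generality. Introducing generic random variables $N_1, \dots, N_k$, which are independent
standard normals, we can represent the risk of $d^N$ at $\theta \in \mathbb{R}^k$ by

$$\begin{aligned} R(\theta, d^N) &= 1 - P_\theta\{X_{(k)} = X_{[k]}\} \\ &= 1 - P\{\theta_{[i]} + p_{(i)}^{1/2} N_i < \theta_{[k]} + p_{(k)}^{1/2} N_k,\ i < k\}. \end{aligned} \tag{7}$$

And since this is an increasing function of $\theta_{[i]}$, $i < k$, we conclude that

$$\begin{aligned} \sup_\theta R(\theta, d^N) &= 1 - \inf \int_{\mathbb{R}} \prod_{i=1}^{k-1} \Phi\big((p_{(k)}/p_{(i)})^{1/2} z\big)\,\varphi(z)\,dz \\ &= 1 - \int_{\mathbb{R}} \prod_{i=2}^{k} \Phi\big((p_{[1]}/p_{[i]})^{1/2} z\big)\,\varphi(z)\,dz, \end{aligned} \tag{8}$$

where the infimum is taken over all pairings of the variances with the ordered means, and
where the second equation is a consequence of Lemma 1. Moreover, from Lemma 1 we see that

$$\sup_\theta R(\theta, d^N) \ge 1 - 1/k, \tag{9}$$

with equality holding if and only if $p_1 = p_2 = \dots = p_k$.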

Thus, to complete the proof, we have to show that the minimax value of the given
problem is equal to $1 - 1/k$. Since the no-data rule $d^0$ has constant risk $1 - 1/k$, it
suffices to find a sequence of priors such that the sequence of associated Bayes risks
tends to this value. The following class of conjugate priors will be seen to contain such a
sequence.

Let, a priori, $\Theta_1, \dots, \Theta_k$ be independent random variables with
$\Theta_i \sim N(\mu_i, r_i)$, $\mu_i \in \mathbb{R}$, $r_i > 0$, $i = 1, \dots, k$. Then, as
is well known, a posteriori, given $X = x$, $\Theta_1, \dots, \Theta_k$ are independent
normals with expectations $(\mu_i p_i + r_i x_i)/(p_i + r_i)$ and variances
$r_i p_i/(p_i + r_i)$, $i = 1, \dots, k$, respectively. And marginally, $X_1, \dots, X_k$ are
independent normals with expectations $\mu_i$ and variances $p_i + r_i$, $i = 1, \dots, k$,
respectively.
At $X = x$, the Bayes rule $d^B$, say, minimizes the posterior expected loss, and it yields
the posterior risk

$$\begin{aligned} &\min_{i=1,\dots,k} \big(1 - P\{\Theta_i = \Theta_{[k]} \mid X = x\}\big) \\ &\quad = 1 - \max_{i=1,\dots,k} \int_{\mathbb{R}} \prod_{j \ne i} \Phi\Big((\alpha_j p_j)^{-1/2}\big(\alpha_i x_i + (1 - \alpha_i)\mu_i - \alpha_j x_j - (1 - \alpha_j)\mu_j + (\alpha_i p_i)^{1/2} z\big)\Big)\,\varphi(z)\,dz, \end{aligned} \tag{10}$$

where $\alpha_s = r_s/(p_s + r_s)$, $s = 1, \dots, k$.

For the special case of $r_s = 1/n$ and $\mu_s = 0$, $s = 1, \dots, k$, we see that (10) tends
to $1 - 1/k$ if $n$ tends to infinity. And since the marginal densities of
$X_1, \dots, X_k$ are bounded by a constant, a routine application of Lebesgue's dominated
convergence theorem shows that the sequence of Bayes risks in fact tends to $1 - 1/k$ if $n$
tends to infinity. This completes the proof of the theorem.
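
The limit used in the last step can be illustrated numerically. The sketch below evaluates the posterior probabilities $P\{\Theta_i = \Theta_{[k]} \mid X = x\}$ appearing in (10) under the conjugate prior with $r_s = 1/n$, $\mu_s = 0$. The helper names, the quadrature settings and the example values of `x` and `p` are assumptions made for illustration only.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo=-10.0, hi=10.0, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(lo + m * h)
    return total * h / 3.0

def posterior_prob_best(i, x, p, mu, r):
    """P{Theta_i is the largest | X = x} for independent N(mu_s, r_s) priors."""
    alpha = [rs / (ps + rs) for ps, rs in zip(p, r)]
    mean = [a * xs + (1.0 - a) * ms for a, xs, ms in zip(alpha, x, mu)]
    sd = [math.sqrt(a * ps) for a, ps in zip(alpha, p)]

    def integrand(z):
        t = mean[i] + sd[i] * z
        others = (Phi((t - mean[j]) / sd[j]) for j in range(len(x)) if j != i)
        return math.prod(others) * phi(z)

    return simpson(integrand)

# Hypothetical data: k = 3 with unequal sampling variances.
x, p, k = [1.0, -2.0, 0.5], [1.0, 2.0, 4.0], 3
n = 10 ** 6
probs = [posterior_prob_best(i, x, p, [0.0] * k, [1.0 / n] * k) for i in range(k)]
```

For large `n` all entries of `probs` should be close to $1/k$, so the posterior risk in (10) is close to $1 - 1/k$ regardless of the observed `x`.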

From (8) in the last proof, we can now see clearly what might go wrong in the
performance of the natural rule $d^N$. If the parameters $\theta_1, \dots, \theta_k$ are close
together, and if the variance $p_{(k)}$ of $X_{(k)}$, which is associated with
$\theta_{[k]}$, is relatively small in comparison with $p_{(i)}$, $i \ne k$, then the rule
$d^N$ performs “worse than at random”.
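
To make "worse than at random" concrete, the following sketch evaluates the integral in (8) for a given pairing, i.e. the limiting value of $P(CS \mid d^N)$ as all means approach each other. The variances in `p` are hypothetical example values, and the helper names are illustrative assumptions.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo=-10.0, hi=10.0, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(lo + m * h)
    return total * h / 3.0

def pcs_equal_means(p, best):
    """Limiting P(CS | d^N) when all means coincide and population `best`
    (0-based index) is the one holding the largest mean."""
    def integrand(z):
        factors = (Phi(math.sqrt(p[best] / p[i]) * z)
                   for i in range(len(p)) if i != best)
        return math.prod(factors) * phi(z)
    return simpson(integrand)

p = [4.0, 4.0, 0.25]                       # hypothetical variances, k = 3
pcs_small = pcs_equal_means(p, best=2)     # best population has the smallest variance
pcs_large = pcs_equal_means(p, best=0)     # best population has a large variance
```

Here `pcs_small` falls below $1/k = 1/3$ while `pcs_large` exceeds it, which is the asymmetry that (8) and Lemma 1 describe.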

One natural way out of this dilemma, and to possibly save the reputation of $d^N$, is
to look at the average risk over all $k!$ permutations of a given parameter vector
$\theta$, rather than taking the risk function as a measure of performance. The average
risk of a rule $d$ at $\theta \in \mathbb{R}^k$ would be

$$\bar{R}(\theta, d) = (1/k!) \sum_{\pi} R(\pi(\theta), d), \tag{11}$$

where $\pi(\theta) = (\theta_{\pi(1)}, \dots, \theta_{\pi(k)})$, and the summation is taken
over all $k!$ permutations $\pi$ of $(1, 2, \dots, k)$. The average risk reflects perhaps
better the prevailing attitude of researchers in the area of ranking and selection, which
states that “the pairing between the $\theta_i$'s and the $p_i$'s is completely unknown.”

It can be shown that with respect to the average risk $\bar{R}$, $d^N$ is in fact minimax.
This result, however, is not of great support for $d^N$, since it shares this property with
a large class of monotone decision rules, as we shall see in the next theorem.

THEOREM 2. Let $d^h$ be the decision rule which selects in terms of the largest
$h_i(X_i)$, $i = 1, \dots, k$, where $h_1, \dots, h_k$ are strictly increasing functions.
Then $d^h$ is minimax with respect to the average risk $\bar{R}$, and the minimax value of
the problem is again $1 - 1/k$.

Proof: For every decision rule $d$, and for every permutation symmetric prior with
density $p$ w.r.t. the Lebesgue measure on $\mathbb{R}^k$, the Bayes risk satisfies

$$r(p, d) = \int_{\mathbb{R}^k} R(\theta, d)\,p(\theta)\,d\theta = k! \int_{\{\theta \,:\, \theta_1 < \dots < \theta_k\}} \bar{R}(\theta, d)\,p(\theta)\,d\theta \le \sup_\theta \bar{R}(\theta, d). \tag{12}$$

Since the sequence of priors chosen below (10) consists of such symmetric priors, it
follows similarly as in the proof of Theorem 1 that the no-data rule $d^0$ is minimax
w.r.t. $\bar{R}$, and that the minimax value is again $1 - 1/k$. It thus remains to show that
every rule of type $d^h$ has supremal average risk $1 - 1/k$. Let $d^h$ be any such rule, and
let $\theta \in \mathbb{R}^k$ be fixed, where we may assume without loss of generality that
$\theta_1 < \theta_2 < \dots < \theta_k$ holds. Then similarly as before in (7),

$$\begin{aligned} (1/k!) \sum_{\pi} P_{\pi(\theta)}(CS \mid d^h) &= (1/k!) \sum_{\pi} P_{\pi(\theta)}\{h_{\pi^{-1}(k)}(X_{\pi^{-1}(k)}) > h_{\pi^{-1}(j)}(X_{\pi^{-1}(j)}),\ j < k\} \\ &= (1/k!) \sum_{\pi} P\{h_{\pi^{-1}(k)}(\theta_k + \beta_{\pi^{-1}(k)} N_k) > h_{\pi^{-1}(j)}(\theta_j + \beta_{\pi^{-1}(j)} N_j),\ j < k\}, \end{aligned} \tag{13}$$

where $N_1, \dots, N_k$ are independent standard normals, and $\beta_i = p_i^{1/2}$,
$i = 1, \dots, k$. A lower bound of (13) is attained if all
$\theta_1, \dots, \theta_{k-1}$ are put equal to $\theta_k$, because of the monotonicity of
$h_1, \dots, h_k$. Doing so, and then splitting the sum into a suitable double sum, we see
that the lower bound is

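$$\begin{aligned} &(1/k!) \sum_{i=1}^{k} \sum_{\pi \,:\, \pi(i) = k} P\{h_i(\theta_k + \beta_i N_i) > h_{\pi^{-1}(j)}(\theta_k + \beta_{\pi^{-1}(j)} N_j),\ j < k\} \\ &\quad = (1/k) \sum_{i=1}^{k} P\{h_i(\theta_k + \beta_i N_i) > h_j(\theta_k + \beta_j N_j),\ j \ne i\} = 1/k. \end{aligned} \tag{14}$$

Thus, in view of (1), the supremal average risk of $d^h$ is equal to $1 - 1/k$, and the proof
of the theorem is completed.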

Our conclusions of this section are (1) that the natural rule $d^N$ cannot be accepted
as a universally good decision rule, and (2) that the minimax principle does not lead to
a convincing alternative to $d^N$. Therefore it seems reasonable to study the form of Bayes
rules with respect to various priors in more detail, in the hope of learning more about how
such good decision rules act in different situations. Our main interest thereby will focus
on permutation symmetric (exchangeable) and on conjugate priors. This will be done in the
subsequent sections.

3.  BAYES  RULES  FOR  EXCHANGEABLE  PRIORS 

Permutation invariant (exchangeable) priors appear to be the suitable priors to adopt
if there is no initial knowledge available as to how the ordered parameters
$\theta_{[1]}, \dots, \theta_{[k]}$ are associated with the populations $P_1, \dots, P_k$.
They reflect the prior opinion that each of the $k$ populations may equally likely be the
one which has the largest mean.

Since we are considering Bayes rules, we may restrict considerations to nonrandomized
decision rules $d$, which can be represented simply by measurable functions
$d : \mathbb{R}^k \to \{1, 2, \dots, k\}$, where $d(x) = i$ means that at $X = x$, $d$ selects
population $P_i$, $i = 1, \dots, k$, $x \in \mathbb{R}^k$.


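Now, for any prior $\tau$, after $X = x$ has been observed, the Bayes rule selects that
population which is associated with the smallest posterior expected loss. This decision
process thus consists of pairwise comparisons of the $k$ competing posterior risks. Ignoring
a common factor, which depends on $x$ and $p$, the Bayes rule $d^B$ can be written as

$$d^B(x) = i \quad \text{if} \quad \mathcal{L}(i \mid x) = \max_{j=1,\dots,k} \mathcal{L}(j \mid x), \tag{15}$$

where

$$\mathcal{L}(s \mid x) = \int_{\{\theta \,:\, \theta_s = \theta_{[k]}\}} \prod_{j=1}^{k} \exp\{-(x_j - \theta_j)^2/(2 p_j)\}\,d\tau(\theta), \quad s = 1, \dots, k.$$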


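To find out under which conditions one population is preferred over another one if
$\tau$ is symmetric, let us compare without loss of generality $\mathcal{L}(2 \mid x)$ and
$\mathcal{L}(1 \mid x)$, say, to keep the notation simple. After exchanging the variables
$\theta_1$ and $\theta_2$ in the integral representation of $\mathcal{L}(2 \mid x)$, and some
standard calculations, we see that

$$\mathcal{L}(2 \mid x) - \mathcal{L}(1 \mid x) = \int_{\{\theta \,:\, \theta_1 = \theta_{[k]}\}} \big[M_{2,1}(x, \theta) - 1\big] \prod_{j=1}^{k} \exp\{-(x_j - \theta_j)^2/(2 p_j)\}\,d\tau(\theta), \tag{16}$$

where

$$M_{2,1}(x, \theta) = \exp\big\{(\theta_1 - \theta_2)\big[(x_2 - (\theta_1 + \theta_2)/2)/p_2 - (x_1 - (\theta_1 + \theta_2)/2)/p_1\big]\big\}.$$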

Although the Bayes rules may in general have very complicated forms, several
conclusions can be drawn from (16). The first one is

THEOREM 3. Under a symmetric prior $\tau$, suppose that for two populations $P_a$ and
$P_b$, say, the variances $p_a$ and $p_b$ are equal. Then the Bayes rule relatively ranks
$P_a$ and $P_b$ in the same way as the natural rule $d^N$, namely according to the larger of
the two values $x_a$ and $x_b$, no matter what $x_i$, $i \ne a, b$, might actually be.

Another finding is the following. Suppose we know a constant lower (upper) bound $d$
to $\theta_1, \dots, \theta_k$. Then if $p_a > (<)\, p_b$ and if
$(x_a - d)/p_a > (x_b - d)/p_b$, every Bayes rule w.r.t. a symmetric prior prefers $P_a$ to
$P_b$.
If the prior knowledge asserts that the parameters $\theta_1, \dots, \theta_k$ are in a
slippage configuration
$\theta_1 = \dots = \theta_{i-1} = \theta_{i+1} = \dots = \theta_k = \delta$ and
$\theta_i = \delta + \Delta$, where $\delta \in \mathbb{R}$ and $\Delta > 0$ are known, and
where a priori each $i \in \{1, \dots, k\}$ may be, with the same probability $1/k$, the
index of the slipped population, then from (16) it follows that the Bayes rule is given by

$$d^{\delta,\Delta}(x) = i \quad \text{if} \quad (x_i - \delta - \Delta/2)/p_i = \max_{j=1,\dots,k} (x_j - \delta - \Delta/2)/p_j. \tag{17}$$

It should be noted that it is quite different from the decision rule $d^f$, say, which
selects in terms of the smallest p-value of the best one-sample tests for
$H_i : \theta_i = \delta$ versus $K_i : \theta_i = \delta + \Delta$, and thus selects in
terms of the largest $(x_i - \delta)/p_i^{1/2}$, $i = 1, \dots, k$.
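
The difference between the two rules can be seen in a small sketch. The function names, the 0-based population indices and the example numbers are illustrative assumptions. Ties are broken in favor of the first maximizer, which the text does not specify.

```python
import math

def slippage_bayes_rule(x, p, delta, Delta):
    """Rule (17): maximize (x_i - delta - Delta/2) / p_i."""
    scores = [(xi - delta - Delta / 2.0) / pi for xi, pi in zip(x, p)]
    return max(range(len(x)), key=lambda i: scores[i])

def standardized_rule(x, p, delta):
    """Smallest p-value rule: maximize (x_i - delta) / sqrt(p_i)."""
    scores = [(xi - delta) / math.sqrt(pi) for xi, pi in zip(x, p)]
    return max(range(len(x)), key=lambda i: scores[i])

# Hypothetical data: tied observations below delta + Delta/2, unequal variances.
x, p = [0.2, 0.2], [1.0, 4.0]
choice_bayes = slippage_bayes_rule(x, p, delta=0.0, Delta=1.0)
choice_pvalue = standardized_rule(x, p, delta=0.0)
```

With these inputs the Bayes rule (17) selects the population with the larger variance, whereas the standardized rule selects the one with the smaller variance.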

Exchangeable normal priors give Bayes rules which are in general quite complicated
in their structure. Although we know that the Bayes rule is determined by

$$d^B(x) = i \quad \text{if} \quad P\{\Theta_i = \Theta_{[k]} \mid X = x\} = \max_{j=1,\dots,k} P\{\Theta_j = \Theta_{[k]} \mid X = x\}, \tag{18}$$

and a prior $\Theta \sim N(\mu, \Lambda)$ with $X \mid \Theta = \theta \sim N(\theta, \Sigma)$
would result in
$\Theta \mid X = x \sim N\big(x - \Sigma(\Sigma + \Lambda)^{-1}(x - \mu),\ (\Sigma^{-1} + \Lambda^{-1})^{-1}\big)$
where, marginally, $X \sim N(\mu, \Sigma + \Lambda)$, there is not much simplification to
gain if we assume that $\mu = \mu_0 \mathbf{1}$ and
$\Lambda = a I + b\,\mathbf{1}\mathbf{1}^T$, where $\mathbf{1} = (1, \dots, 1)^T$, $I$ is the
$k \times k$ identity matrix, $\mu_0, b \in \mathbb{R}$, $a > 0$, and $a + kb > 0$ to have
$\Lambda$ positive definite, even if, as in the present setting, $\Sigma$ is diagonal with
diagonal elements $p_1, \dots, p_k$.

One limiting case, however, the noninformative prior case, is of natural interest and
leads in fact to an interesting decision rule. Suppose we are in the situation which led to
(10), but now letting $r_1, \dots, r_k$ tend to infinity. Then the generalized Bayes rule
$d^\infty$, say, can be seen to be based, formally, on $\Theta_i \sim N(x_i, p_i)$,
$i = 1, \dots, k$, independent, at $X = x$, and to be given by

$$d^\infty(x) = i \quad \text{if} \quad U(i \mid x) = \max_{j=1,\dots,k} U(j \mid x), \tag{19}$$

where

$$U(s \mid x) = \int_{\mathbb{R}} \prod_{j \ne s} \Phi\big(p_j^{-1/2}(x_s - x_j + p_s^{1/2} z)\big)\,\varphi(z)\,dz, \quad s = 1, \dots, k.$$
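
A rough numerical sketch of the rule (19) is given below. It approximates $U(s \mid x)$ by Simpson quadrature on $[-10, 10]$. The names `U` and `d_inf`, the quadrature settings, the 0-based indices and the example inputs are assumptions for illustration.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo=-10.0, hi=10.0, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(lo + m * h)
    return total * h / 3.0

def U(s, x, p):
    """Formal posterior probability that Theta_s is the largest,
    with Theta_j ~ N(x_j, p_j) independent."""
    def integrand(z):
        t = x[s] + math.sqrt(p[s]) * z
        others = (Phi((t - x[j]) / math.sqrt(p[j]))
                  for j in range(len(x)) if j != s)
        return math.prod(others) * phi(z)
    return simpson(integrand)

def d_inf(x, p):
    """Return the selected index under (19) together with all scores."""
    scores = [U(s, x, p) for s in range(len(x))]
    return max(range(len(x)), key=lambda s: scores[s]), scores

tied_choice, tied_scores = d_inf([0.0, 0.0, 0.0], [1.0, 2.0, 4.0])
clear_choice, _ = d_inf([5.0, 0.0, 0.0], [1.0, 2.0, 4.0])
```

With all $x_i$ equal, the computed scores order the populations by variance, in line with Lemma 1. When one observation clearly dominates, the rule agrees with $d^N$.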

One interesting feature of $d^\infty$ is that it selects in terms of the largest variance
among $p_1, \dots, p_k$ whenever $x_1, \dots, x_k$ are lying closely together. This is an
immediate consequence of Lemma 1. We conclude this section by proposing two other types of
decision rules which seem to be reasonable alternatives to $d^N$, worth studying in more
detail in the future. The first is given by

$$d^\varepsilon(x) = d^N(x) \ \text{if } x \notin B, \quad \text{and} \quad d^\varepsilon(x) = i \ \text{if } x \in B \text{ and } p_i = \max_{j=1,\dots,k} p_j, \tag{20}$$

where $B \subset \mathbb{R}^k$ is an area where the coordinates of the vectors are close to
each other, e.g. where $\max_{i,j} |x_i - x_j| < \varepsilon$ for some $\varepsilon > 0$. The
other type of decision rule is of the form (19), where $U$ is replaced by $V$, say, with

$$V(s \mid x) = (x_s - \bar{x})/p_s, \quad s = 1, \dots, k, \tag{21}$$

and where $\bar{x}$ is an average of $x_1, \dots, x_k$, e.g. the weighted average with weights
$p_1^{-1}, \dots, p_k^{-1}$.
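
Both proposals are simple to state in code. The sketch below assumes the region $B$ is defined by the maximal pairwise distance, and that the average in (21) is the precision-weighted mean normalized by the sum of the weights. The function names, the tie-breaking (first maximizer) and the example values are illustrative assumptions.

```python
def argmax(values):
    """Index (0-based) of the first largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def natural_rule(x):
    """d^N: select the largest observation."""
    return argmax(x)

def d_eps(x, p, eps):
    """Rule (20) with B = {x : max_{i,j} |x_i - x_j| < eps}."""
    if max(x) - min(x) < eps:
        return argmax(p)          # observations nearly tied: largest variance
    return natural_rule(x)

def d_weighted(x, p):
    """Rule of type (21) with the precision-weighted average as centre."""
    weights = [1.0 / pi for pi in p]
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
    return argmax([(xi - xbar) / pi for xi, pi in zip(x, p)])

p = [1.0, 2.0, 4.0]                               # hypothetical variances
near_tie = d_eps([1.0, 1.05, 0.98], p, eps=0.1)   # inside B
separated = d_eps([1.0, 2.0, 0.5], p, eps=0.1)    # outside B
weighted = d_weighted([1.0, 2.0, 0.5], p)
```

How to choose `eps` in practice is left open here, as it is in the text.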


4.  BAYES  RULES  FOR  POSTERIORS  WITH  (DT). 

One of the basic facts which lead to the “Bahadur et al. Theorem” mentioned in
the introduction is the following. Suppose that at every $X = x$, the posterior depends
on $x$ through $g(x) = (g_1(x), \dots, g_k(x))$, where $g_1, \dots, g_k$ are given functions.
Then if the posterior is (DT), i.e. decreasing in transposition, in $(\theta, g(x))$, and if
the loss function is permutation invariant and favors selection of larger parameters, then
the posterior risk acts like the loss function where $g(x)$ plays the role of $\theta$. For
details see Gupta and Miescke (1984). Thus the Bayes rule selects here in terms of the
largest $g_i(x)$, $i = 1, \dots, k$.

Under a normal prior $\Theta \sim N(\mu, \Lambda)$, as considered after the statement of
(18), the posterior is (DT) if and only if the covariance matrix associated with it is of
the form

$$(\Sigma^{-1} + \Lambda^{-1})^{-1} = \alpha^2 \big[(1 - \rho) I + \rho\,\mathbf{1}\mathbf{1}^T\big], \tag{22}$$

where $\alpha \in \mathbb{R}$, $\alpha \ne 0$, and $-(k-1)^{-1} < \rho < 1$ are necessary and
sufficient for this matrix to be positive definite.

If (22) holds true, the conditional expectation of $\Theta$ given $X = x$ can be seen to be

$$E\{\Theta \mid X = x\} = \mu + \gamma(x)\,\mathbf{1} + \alpha^2 (1 - \rho) \big((x_1 - \mu_1)/p_1, \dots, (x_k - \mu_k)/p_k\big)^T, \tag{23}$$

where $\gamma(x)$ is a certain function which is, as we shall see, of no relevance for the
Bayes rule. Namely, if we set $g_i(x) = E\{\Theta_i \mid X = x\}$, $i = 1, \dots, k$,
$x \in \mathbb{R}^k$, then the posterior is (DT) in $(\theta, g(x))$, and the Bayes rule is
given by

$$d^B(x) = i \quad \text{if} \quad J(i \mid x) = \max_{j=1,\dots,k} J(j \mid x), \tag{24}$$

where $J(s \mid x) = \mu_s + \alpha^2 (1 - \rho)(x_s - \mu_s)/p_s$, $s = 1, \dots, k$.


Of special interest hereby is the case of $\mu_1 = \dots = \mu_k = \mu$, where the Bayes
rule $d^M$, say, assumes the simple form

$$d^M(x) = i \quad \text{if} \quad (x_i - \mu)/p_i = \max_{j=1,\dots,k} (x_j - \mu)/p_j, \tag{25}$$

which is almost the same as that of the Bayes rule for the slippage situation, given by
(17).

The interesting feature of the rule $d^\infty$, discussed just after the statement of (19),
has an analog in the rules given by (25) and (17). If there are (almost) tied $x_i$'s, which
are smaller (larger) than $\mu$ or $\delta + \Delta/2$, respectively, then the Bayes rules
prefer the population with the larger (smaller) variance.
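
The behavior at tied observations can be checked with a short sketch of the rule (24), which reduces to (25) when all prior means coincide. The function name, the argument names `alpha2` and `rho`, and the example values are assumptions for illustration. The code does not verify that condition (22) actually holds for a given prior.

```python
def dt_bayes_rule(x, p, mu, alpha2, rho):
    """Rule (24): maximize J(s|x) = mu_s + alpha^2 (1 - rho) (x_s - mu_s) / p_s.
    Returns a 0-based index; ties go to the first maximizer."""
    scores = [m + alpha2 * (1.0 - rho) * (xi - m) / pi
              for xi, pi, m in zip(x, p, mu)]
    return max(range(len(x)), key=lambda s: scores[s])

p, mu = [1.0, 4.0], [1.0, 1.0]       # hypothetical variances, common prior mean
below = dt_bayes_rule([0.5, 0.5], p, mu, alpha2=0.5, rho=0.0)  # tied, below mu
above = dt_bayes_rule([1.5, 1.5], p, mu, alpha2=0.5, rho=0.0)  # tied, above mu
```

Tied observations below the common prior mean lead to the population with the larger variance, and tied observations above it lead to the one with the smaller variance.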

The choice of a prior which results in a posterior with the (DT)-property, and
ultimately in a Bayes rule of simple structure, is made not only for convenience. It also
has a statistical justification, since it leads to a posterior situation where the
information about the unknown parameters $\theta_1, \dots, \theta_k$ is equally and thus
fairly balanced. This can be seen perhaps most easily in the case where $\rho = 0$ in (22).
Then $\Lambda$ is diagonal with diagonal elements $r_1, \dots, r_k$, say, which brings us
back to the situation considered at (10), where now we have

$$p_i^{-1} + r_i^{-1} = \alpha^{-2}, \quad i = 1, \dots, k. \tag{26}$$

Calling, as usual, the inverse of a variance “precision,” the sum of the prior precision and
the sampling precision is constant across the $k$ populations, if (26) holds.

Returning to the original form of the problem, as it was presented in the introduction,
we can state the following interesting fact. Suppose that the prior is known, as it should
be, before the sampling is performed. Suppose further that the sample sizes from the
populations $P_1, \dots, P_k$ can be chosen in such a way that (22) holds, which means that
in case of a diagonal $\Lambda$, the condition (26) is fulfilled. Then the information about
the unknown parameters $\theta_1, \dots, \theta_k$ is fairly balanced, and the Bayes decision
rule assumes the simple form given by (24) or (25), respectively. It should be pointed out
clearly that in this case the Bayes rule is the same under every loss which is permutation
invariant and favors selection of larger parameters.
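
For a diagonal $\Lambda$, condition (26) can be turned into a sample size allocation. The sketch below assumes the model of the introduction, where $p_i = \sigma^2/n_i$, so that (26) reads $n_i/\sigma^2 + 1/r_i = \alpha^{-2}$. It returns real-valued sizes and ignores the rounding to integers that an actual design would need. The function name and the example numbers are hypothetical.

```python
def balanced_sample_sizes(sigma2, r, alpha2):
    """Solve n_i / sigma2 + 1 / r_i = 1 / alpha2 for n_i (real-valued)."""
    target = 1.0 / alpha2                  # common posterior precision
    sizes = []
    for ri in r:
        n = sigma2 * (target - 1.0 / ri)
        if n <= 0.0:
            raise ValueError("prior precision already exceeds the target")
        sizes.append(n)
    return sizes

# Hypothetical prior variances; a vaguer prior calls for a larger sample.
sizes = balanced_sample_sizes(sigma2=1.0, r=[1.0, 2.0, 4.0], alpha2=0.1)
```

In this example the population with the most diffuse prior receives the largest sample, which matches the reading of (26) as balancing prior and sampling precision.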

5.  CONCLUDING  REMARKS. 

It can be seen easily that the natural rule $d^N$ is an extended Bayes rule. Indeed, if
a priori $\Theta_1, \dots, \Theta_k$ are i.i.d. $N(0, n)$, then the Bayes risk of $d^N$ with
respect to this prior tends to 0 if $n$ tends to infinity. This is not surprising, as we
know that the performance of $d^N$ is only unsatisfactory if the parameters
$\theta_1, \dots, \theta_k$ are close together. On the other hand, we saw that $d^N$ cannot
be the Bayes rule for any normal prior $\Theta \sim N(\mu, \Lambda)$.

We could not settle, however, the interesting question of whether or not $d^N$ is
admissible under the 0-1 loss function on the parameter space
$\Omega = \{\theta \in \mathbb{R}^k \mid \theta_{[k]} \text{ is unique}\}$. The restriction
of parameters to $\Omega$ is made for simplicity, and does not cause any loss of generality.
For other loss functions, however, this restriction may not simplify matters and may not be
made, as e.g. in the example given below.

The 0-1 loss function was adopted in our study because it connects the risk function
in a natural way through (1) with the probability of a correct selection, which is the
performance characteristic of decision rules considered primarily in the area of ranking
and selection. With respect to other loss functions, however, the natural rule $d^N$ may in
fact be a proper Bayes rule and admissible on $\mathbb{R}^k$, as the following example
demonstrates. Assume that the loss for selecting population $P_i$ at parameters
$\theta_1, \dots, \theta_k$ is of the form

$$L(\theta, i) = \theta_{[k]} - \theta_i, \quad i = 1, \dots, k, \ \theta \in \mathbb{R}^k. \tag{27}$$


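Then the Bayes rule at $X = x$ is given by

$$d^b(x) = i \quad \text{if} \quad E\{\Theta_i \mid X = x\} = \max_{j=1,\dots,k} E\{\Theta_j \mid X = x\}, \quad i = 1, \dots, k. \tag{28}$$

Therefore, if a priori $\Theta_i \sim N(\mu_i, r_i)$, $i = 1, \dots, k$, independent, the
Bayes rule at $X = x$ turns out to be

$$d^b(x) = i \quad \text{if} \quad M(i \mid x) = \max_{j=1,\dots,k} M(j \mid x), \tag{29}$$

where $M(s \mid x) = (p_s \mu_s + r_s x_s)/(p_s + r_s)$, $s = 1, \dots, k$. And it can now be
seen that $d^b$ is the natural rule $d^N$ if $\mu_i = \mu$ and $r_i = c\,p_i$,
$i = 1, \dots, k$, for some fixed $\mu \in \mathbb{R}$ and $c > 0$.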
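
The reduction to $d^N$ can be checked directly. The sketch below implements (29) and compares it with the natural rule for a prior with $r_i = c\,p_i$ and for one without this proportionality. Function names, 0-based indices and all numbers are illustrative assumptions.

```python
def posterior_mean_rule(x, p, mu, r):
    """Rule (29): maximize M(s|x) = (p_s mu_s + r_s x_s) / (p_s + r_s)."""
    scores = [(ps * ms + rs * xs) / (ps + rs)
              for xs, ps, ms, rs in zip(x, p, mu, r)]
    return max(range(len(x)), key=lambda s: scores[s])

def natural_rule(x):
    """d^N: select the largest observation."""
    return max(range(len(x)), key=lambda s: x[s])

# Proportional case r_i = c * p_i with common prior mean: same choice as d^N.
p, c = [1.0, 2.0, 4.0], 3.0
x = [0.3, 1.2, 0.9]
proportional = posterior_mean_rule(x, p, [0.5] * 3, [c * pi for pi in p])

# Non-proportional case: equal prior variances but unequal sampling variances.
x2, p2 = [1.0, 1.5], [1.0, 4.0]
non_proportional = posterior_mean_rule(x2, p2, [0.0, 0.0], [1.0, 1.0])
```

In the proportional case $M(s \mid x) = (\mu + c\,x_s)/(1 + c)$ is the same increasing function of $x_s$ for every population, which is why the two rules coincide. In the second example the noisier observation is shrunk more strongly towards the prior mean, and the choice differs from that of $d^N$.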

The admissibility of all Bayes rules considered in this paper, those under 0-1 loss on
$\Omega$ as well as those under the loss function (27) on $\mathbb{R}^k$, follows from the
fact that the risk function of every selection rule in these problems is continuous in
$\theta$. This is an immediate consequence of the well known fact that the expectation of a
bounded function under a multi-parameter exponential family is continuous in these
parameters.